Reviews: Self-Critical Reasoning for Robust Visual Question Answering
Originality: The proposed work is inspired by an existing work, HINT (Selvaraju et al., arXiv 2019), which also proposes a novel training objective to align the model's gradient-based importance for various object proposals in the image with the regions identified as important by humans. This paper improves upon HINT in three ways: 1) instead of training the model to align its gradient-based importance with regions identified as important by humans, the paper trains the model to strengthen its importance for the most influential region, that is, the proposal that the model's gradient-based importance ranks highest among the set of regions identified as most important by humans; 2) in addition to using visual regions identified as important by humans, the paper also introduces the use of textual explanations provided by humans and of training QA pairs to identify important image regions; 3) the paper proposes another term in the objective that criticizes incorrect predicted answers for being more sensitive to the influential region than correct answers.

Quality: The paper does a good job of evaluating the proposed approach on both the VQA-CP and VQA datasets. The evaluation of the ablations of the proposed approach and of the false sensitivity rate is also useful.

Clarity: The paper is clear for the most part, except for the following: 1. Currently, in order to understand how the gradients from the proposed training objectives affect the model's parameters, one needs to read the HINT paper.
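To make the summary under Originality concrete, the selection of the influential region and the self-critical term can be sketched roughly as below. This is a minimal illustration of my reading of the idea, not the authors' implementation: the function names, the hinge form of the penalty, and the toy sensitivity values are all assumptions, and the gradient-based sensitivities are taken as precomputed numbers rather than derived from a real VQA model.

```python
# Hypothetical sketch: sensitivities are assumed to be precomputed
# gradient-based importance scores, one list per candidate answer,
# with one entry per object proposal in the image.

def influential_region(sens_correct, human_regions):
    """Among the proposals marked important by humans, return the index of the
    one to which the correct answer is most sensitive."""
    return max(human_regions, key=lambda i: sens_correct[i])


def self_critical_penalty(sens_by_answer, correct, region):
    """Sum of hinge terms (assumed form): an incorrect answer is penalized only
    when it is more sensitive to the influential region than the correct one."""
    reference = sens_by_answer[correct][region]
    return sum(
        max(0.0, scores[region] - reference)
        for answer, scores in sens_by_answer.items()
        if answer != correct
    )


# Toy values (invented for illustration): three proposals, three answers.
sens_by_answer = {
    "white": [0.1, 0.6, 0.3],  # correct answer
    "brown": [0.2, 0.9, 0.1],  # more sensitive to proposal 1 than "white"
    "green": [0.0, 0.4, 0.5],  # less sensitive to proposal 1 than "white"
}
human_regions = [1, 2]  # proposals overlapping the human-annotated regions

region = influential_region(sens_by_answer["white"], human_regions)
penalty = self_critical_penalty(sens_by_answer, "white", region)
print(region, penalty)
```

In this toy case only "brown" contributes to the penalty, because it is the only incorrect answer whose sensitivity to the selected proposal exceeds that of the correct answer. How such a penalty actually propagates to the model's parameters is exactly the point the Clarity comment asks the paper to spell out.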